Bayesian Additive Regression Trees With Parametric Models of Heteroskedasticity
We incorporate heteroskedasticity into Bayesian Additive Regression Trees
(BART) by modeling the log of the error variance parameter as a linear function
of prespecified covariates. Under this scheme, the Gibbs sampling procedure for
the original sum-of-trees model is easily modified, and the parameters for the
variance model are updated via a Metropolis-Hastings step. We demonstrate the
promise of our approach by showing that it yields more appropriate posterior
predictive intervals than homoskedastic BART in heteroskedastic settings and
that it resists overfitting. Our implementation will be offered in an
upcoming release of the R package bartMachine. (Comment: 20 pages, 5 figures)
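The variance model described above can be illustrated numerically. The following is a minimal simulation of data whose log error variance is linear in prespecified covariates; the covariates, coefficients, and mean function are all invented for the example, and this is not the Gibbs/Metropolis-Hastings sampler itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Variance model: log(sigma_i^2) = z_i' gamma for prespecified covariates z_i
n = 500
Z = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])  # intercept + one covariate
gamma = np.array([-1.0, 2.0])                            # illustrative coefficients
sigma2 = np.exp(Z @ gamma)                               # per-observation variances

# Heteroskedastic noise around a stand-in mean function f(x)
x = rng.uniform(0, 1, n)
f = np.sin(2 * np.pi * x)
y = f + rng.normal(0, np.sqrt(sigma2))

# Residual spread grows with the covariate, since log-variance is increasing in it
low = y[Z[:, 1] < 0.2] - f[Z[:, 1] < 0.2]
high = y[Z[:, 1] > 0.8] - f[Z[:, 1] > 0.8]
print(low.std() < high.std())
```

Homoskedastic BART fit to such data would produce predictive intervals of roughly constant width, which is what the parametric variance model corrects.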
Matching on-the-fly in Sequential Experiments for Higher Power and Efficiency
We propose a dynamic allocation procedure that increases power and efficiency
when measuring an average treatment effect in sequential randomized trials.
Subjects arrive iteratively and are either randomized or paired via a matching
criterion to a previously randomized subject and administered the alternate
treatment. We develop estimators for the average treatment effect that combine
information from both the matched pairs and unmatched subjects as well as an
exact test. Simulations illustrate the method's higher efficiency and power
over competing allocation procedures in both controlled scenarios and
historical experimental data. (Comment: 20 pages, 1 algorithm, 2 figures, 8 tables)
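The allocation scheme described above can be sketched as follows. The matching criterion here (absolute distance on a single covariate with a fixed threshold) is a simplified stand-in for illustration, not the paper's criterion:

```python
import numpy as np

rng = np.random.default_rng(1)

def allocate(covariates, threshold=0.1):
    """Sketch of matching on-the-fly: each arriving subject is matched to
    the closest previously randomized, still-unmatched subject if within
    `threshold`, and given the alternate treatment; otherwise the subject
    is randomized and held in the reservoir."""
    reservoir = []  # (index, covariate, arm) of so-far unmatched subjects
    arms = np.empty(len(covariates), dtype=int)
    pairs = []
    for i, xi in enumerate(covariates):
        if reservoir:
            j = min(range(len(reservoir)), key=lambda k: abs(reservoir[k][1] - xi))
            idx, xj, arm_j = reservoir[j]
            if abs(xj - xi) < threshold:
                arms[i] = 1 - arm_j          # administer the alternate treatment
                pairs.append((idx, i))
                reservoir.pop(j)
                continue
        arms[i] = rng.integers(0, 2)         # randomize; keep in reservoir
        reservoir.append((i, xi, arms[i]))
    return arms, pairs, reservoir

x = rng.uniform(0, 1, 50)
arms, pairs, reservoir = allocate(x)
# every matched pair receives opposite treatments
print(all(arms[a] != arms[b] for a, b in pairs))
```

An estimator of the average treatment effect would then pool the matched-pair differences with a difference in means over the unmatched reservoir subjects.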
Statistical Analysis and Design of Crowdsourcing Applications
This thesis develops methods for the analysis and design of crowdsourced experiments and crowdsourced labeling tasks. Much of this document focuses on applications including running natural field experiments, estimating the number of objects in images, and collecting labels for word sense disambiguation. Observed shortcomings of the crowdsourced experiments inspired the development of methodology for running more powerful experiments via matching on-the-fly. Using the label data to estimate response functions inspired work on non-parametric function estimation using Bayesian Additive Regression Trees (BART). This work in turn inspired extensions to BART, such as the incorporation of missing data, as well as a user-friendly R package.
bartMachine: Machine Learning with Bayesian Additive Regression Trees
We present a new package in R implementing Bayesian additive regression trees (BART). The package introduces many new features for data analysis using BART, such as variable selection, interaction detection, model diagnostic plots, incorporation of missing data, and the ability to save trees for future prediction. It is significantly faster than the current R implementation, parallelized, and capable of handling both large sample sizes and high-dimensional data.
Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation
This article presents Individual Conditional Expectation (ICE) plots, a tool
for visualizing the model estimated by any supervised learning algorithm.
Classical partial dependence plots (PDPs) help visualize the average partial
relationship between the predicted response and one or more features. In the
presence of substantial interaction effects, the partial response relationship
can be heterogeneous. Thus, an average curve, such as the PDP, can obfuscate
the complexity of the modeled relationship. Accordingly, ICE plots refine the
partial dependence plot by graphing the functional relationship between the
predicted response and the feature for individual observations. Specifically,
ICE plots highlight the variation in the fitted values across the range of a
covariate, suggesting where and to what extent heterogeneities might exist. In
addition to providing a plotting suite for exploratory analysis, we include a
visual test for additive structure in the data generating model. Through
simulated examples and real data sets, we demonstrate how ICE plots can shed
light on estimated models in ways PDPs cannot. Procedures outlined are
available in the R package ICEbox. (Comment: 22 pages, 14 figures, 2 algorithms)
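The ICE computation described above can be sketched directly; the following is assumed from the description rather than taken from the ICEbox implementation, and the model with an interaction is invented for the example:

```python
import numpy as np

def ice_curves(predict, X, feature, grid):
    """For each observation, hold its other features fixed and trace the
    model's prediction over a grid of the feature of interest."""
    curves = np.empty((X.shape[0], len(grid)))
    for j, g in enumerate(grid):
        Xg = X.copy()
        Xg[:, feature] = g          # set the focal feature to the grid value
        curves[:, j] = predict(Xg)
    return curves

# A model with a strong interaction: the effect of x0 flips sign with x1
predict = lambda X: X[:, 0] * np.sign(X[:, 1])

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(100, 2))
grid = np.linspace(-1, 1, 21)
curves = ice_curves(predict, X, feature=0, grid=grid)
pdp = curves.mean(axis=0)           # the classical PDP is the ICE average

# Individual curves have slope +1 or -1, yet their average is nearly flat:
# the PDP obscures heterogeneity that the individual ICE curves reveal.
print(np.abs(pdp).max() < np.abs(curves).max())
```

Plotting all rows of `curves` against `grid`, with `pdp` overlaid, reproduces the contrast the article draws between ICE plots and classical PDPs.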